Abstract. The inclusion of a threshold in the dynamics of layered neural networks 
with variable activity is studied at arbitrary temperature. In particular, the effects 
on the retrieval quality of a self-controlled threshold obtained by forcing the neural 
activity to stay equal to the activity of the stored patterns during the whole retrieval 
process, are compared with those of a threshold chosen externally for every loading 
and every temperature through optimization of the mutual information content of the 
network. Numerical results, mostly concerning low activity networks, are discussed. 

PACS numbers: 64.60.Cn, 87.10.+e, 02.50.-r 

1. Introduction 

Recently, the introduction of a threshold in the dynamics of neural networks with low 
activity has been discussed again by several authors [1], [2], [3] (and references therein). Diluted 
models Q, ||] and models for sequential patterns J| have been looked at. In all cases 
it is found that the retrieval quality - overlap, basin of attraction, critical storage 
capacity, information content - depends on the methods of activity control employed. 
New insights in the dynamical properties of these models have been obtained and new 
suggestions have been put forward for the choice of threshold functions in order to get 
enhanced retrieval. In this context it is interesting to study low activity (in other 
words, sparsely coded) layered neural networks because, as is common knowledge by now, 
exactly these models are used in many applications in several areas of research. 

Sparsely coded models have a very large storage capacity behaving as $1/(a|\ln a|)$ in 
the limit a going to 0, where a is the pattern activity (see, e.g., ||, ^ @, [7| and references 
therein). However, for low activity the basins of attraction might become very small and 
the information content in a single pattern is reduced. For the models mentioned above 
these drawbacks can be avoided and an optimal retrieval performance can be reached 
by introducing an appropriate threshold in the dynamics [1], [2], [3], [7]. 

In the layered models discussed in the sequel we follow two different approaches. 
The first one consists in forcing the neural activity to be the same as the activity of 
the stored patterns during the whole retrieval process. In order to guarantee this we 
introduce a time-dependent threshold in the dynamics chosen as a function of the noise 
and the pattern activity in the network and adapting itself autonomously in the course 
of the time evolution. This is the self-control method proposed in ||. 

The second approach chooses a threshold by optimising the information content of 
the network since for very small pattern activities the number of active neurons and 
the information represented by a single pattern decreases. The relevant quantity we 
use here is the mutual information function §, || and the threshold will be called 
the optimal threshold. Here the threshold is time-independent and externally chosen for 
every loading and every temperature. Both methods are compared for zero and non-zero 
temperatures for networks with various activities. 

The rest of this paper is organised as follows. In Section 2 we introduce the 
layered network model and define the relevant order parameters. Section 3 presents the 
dynamical evolution equations for these order parameters obtained by the probabilistic 
signal-to-noise ratio analysis. In Section 4 we discuss the different thresholds, mostly 
in the context of low activity. In Section 5 we present numerical results at zero and 
non-zero temperatures. Finally we end with some concluding remarks in Section 6. 

2. The model 

Consider a neural network composed of binary neurons arranged in layers, each layer 
containing $N$ neurons. A neuron takes values $\sigma_i(t) \in \{0,1\}$, where $t = 1, \ldots, L$ 
is the layer index and $i = 1, \ldots, N$ labels the site. Each neuron in layer $t$ is 
unidirectionally connected to all neurons in layer $t+1$. We want to store $p = \alpha N$ 
patterns $\{\xi_i^\mu(t)\}$, $i = 1, \ldots, N$, $\mu = 1, \ldots, p$ on each layer $t$, taking the values $\{0, 1\}$. 
They are assumed to be independent identically distributed random variables (i.i.d.r.v.) 
with respect to $i$, $\mu$ and $t$, determined by the probability distribution 
$p(\xi_i^\mu(t)) = a\,\delta(\xi_i^\mu(t) - 1) + (1-a)\,\delta(\xi_i^\mu(t))$. From this form we find that the first and 
second moments of the patterns are given by $E[\xi_i^\mu(t)] = E[\xi_i^\mu(t)^2] = a$. Moreover, no statistical 
correlations occur; in fact, for $\mu \neq \nu$ the covariance vanishes: 
$\mathrm{Cov}(\xi_i^\mu(t), \xi_i^\nu(t)) = E[\xi_i^\mu(t)\xi_i^\nu(t)] - E[\xi_i^\mu(t)]E[\xi_i^\nu(t)] = 0$. In the sequel it will be 
convenient to make the change of variables $\eta_i^\mu(t) = \xi_i^\mu(t) - a$ such that the relevant 
expectation values are $E[\eta_i^\mu(t)] = 0$ and $E[\eta_i^\mu(t)^2] = a(1-a) \equiv \tilde{a}$. 
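
The pattern statistics above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name `sample_patterns`, the seed and the sizes are illustrative choices and not part of the model definition.

```python
import numpy as np

def sample_patterns(p, N, a, rng):
    """Draw p patterns of N binary sites, each site equal to 1 with probability a."""
    return (rng.random((p, N)) < a).astype(float)

rng = np.random.default_rng(0)        # fixed seed, chosen only for reproducibility
a = 0.1                               # illustrative pattern activity
xi = sample_patterns(p=100, N=10_000, a=a, rng=rng)
eta = xi - a                          # shifted variables eta = xi - a

mean_eta = eta.mean()                 # expected to be close to 0
second_moment_eta = (eta ** 2).mean() # expected to be close to a * (1 - a)
print(mean_eta, second_moment_eta, a * (1 - a))
```

With one million sampled sites the empirical moments should agree with the exact values up to fluctuations of order $10^{-3}$.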

The state $\sigma_i(t+1)$ of neuron $i$ on layer $t+1$ is determined by the state of the 
neurons on the previous layer $t$ according to the stochastic rule 

$$P(\sigma_i(t+1) \mid \sigma_1(t), \ldots, \sigma_N(t)) = \left\{1 + \exp[-2\beta(2\sigma_i(t+1) - 1)h_i(t)]\right\}^{-1} . \tag{1}$$

The parameter $\beta = 1/T$ controls the stochasticity of the network dynamics; it measures 
the noise level. Given the configuration $\{\sigma_j(t)\}$, $j = 1, \ldots, N$ on layer $t$, the local field 
$h_i(t)$ in site $i$ on the next layer $t+1$ is given by 

$$h_i(t) = \sum_{j=1}^{N} J_{ij}(t)(\sigma_j(t) - a) - \theta(t) \tag{2}$$

with $\theta(t)$ the threshold to be specified later. The couplings $J_{ij}(t)$ are the synaptic 
strengths of the interaction between neuron $j$ on layer $t$ and neuron $i$ on layer $t+1$. 
They depend on the stored patterns at different layers according to the covariance rule 

$$J_{ij}(t) = \frac{1}{N\tilde{a}} \sum_{\mu=1}^{p} (\xi_i^\mu(t+1) - a)(\xi_j^\mu(t) - a) . \tag{3}$$

These couplings then make it possible to store sets of patterns to be retrieved by the layered 
network. We remark that in the limit $T \to 0$ the updating rule (1) reduces to the 
deterministic form 

$$\sigma_i(t+1) = \Theta(h_i(t)) \tag{4}$$

where $\Theta(x)$ is the standard step function taking the values $\{0, 1\}$. 
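
As an illustration of the covariance rule and the deterministic update, the sketch below propagates a state from one layer to the next. It assumes NumPy; the names `layer_step` and `overlap`, the sizes, the seed and the fixed threshold value 0.4 are hypothetical choices made only for this example.

```python
import numpy as np

def layer_step(sigma, xi_t, xi_next, a, theta):
    """One deterministic (T -> 0) update from layer t to layer t+1."""
    N = sigma.size
    a_tilde = a * (1.0 - a)
    # Covariance-rule couplings: J[i, j] links site j on layer t to site i on layer t+1
    J = (xi_next - a).T @ (xi_t - a) / (N * a_tilde)
    h = J @ (sigma - a) - theta       # local field on layer t+1
    return (h > 0).astype(float)      # step function with values in {0, 1}

def overlap(sigma, xi_mu, a):
    """Overlap of the state sigma with a single pattern xi_mu."""
    return np.mean((xi_mu - a) * (sigma - a)) / (a * (1.0 - a))

rng = np.random.default_rng(1)
N, p, a = 1000, 5, 0.1                # illustrative sizes, alpha = p / N = 0.005
xi_t = (rng.random((p, N)) < a).astype(float)     # patterns on layer t
xi_next = (rng.random((p, N)) < a).astype(float)  # independent patterns on layer t+1

# Start exactly on pattern 1; theta = 0.4 is an arbitrary illustrative threshold
sigma_next = layer_step(xi_t[0], xi_t, xi_next, a, theta=0.4)
M_next = overlap(sigma_next, xi_next[0], a)
error_fraction = np.mean(sigma_next != xi_next[0])
print(M_next, error_fraction)
```

At this very small loading the next layer should reproduce its own representation of pattern 1 almost perfectly; the overlap deviates from 1 mainly through finite-size fluctuations.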

We take parallel updating. The dynamics of this network is defined as follows 
(see [|H], [□]] and references therein). Initially the first layer (the input) is externally 
set in some fixed state. In response to that, all neurons of the second layer update 
synchronously at the next time step, according to the stochastic rule (1), and so 
on. Layered feed-forward networks allow an exact analytic treatment of their parallel 
dynamics stemming from the independent choice of the representations of the patterns 
on different layers. By exact analytic treatment we mean that, given the configuration of 
the first layer as initial state, the configuration on layer t that results from the dynamics 
is predicted by recursion formulas for the relevant order parameters. This configuration 
is known through the calculation of macroscopic quantities obtained by averaging over 
the thermal noise associated with the dynamics, as well as over the random choice of 
the stored patterns. 

The relevant order parameters measuring the quality of retrieval are the main 
overlap of the microscopic state of the network and the $\mu$-th pattern, and the neural 
activity of the neurons 

$$M_N^\mu(t) = \frac{1}{N\tilde{a}} \sum_{i=1}^{N} \eta_i^\mu(t)(\sigma_i(t) - a), \qquad q_N(t) = \frac{1}{N} \sum_{i=1}^{N} \sigma_i(t) . \tag{5}$$

These order parameters determine the Hamming distance between the state of the 
network and the pattern $\{\xi_i^\mu(t)\}$ 

$$d_H(\xi^\mu(t), \sigma(t)) = \frac{1}{N} \sum_{i=1}^{N} [\xi_i^\mu(t) - \sigma_i(t)]^2 . \tag{6}$$

It is known that the Hamming distance is a good measure for the performance of a 
network when the neural activity a ~ 1/2. For low activity networks, however, it does 
not give a complete description of the information content ||. Therefore, the mutual 
information function $I(\sigma_i(t); \xi_i^\mu(t))$ has been introduced 

$$I(\sigma_i(t); \xi_i^\mu(t)) = S(\sigma_i(t)) - \langle S(\sigma_i(t)|\xi_i^\mu(t)) \rangle_{\xi^\mu(t)} \tag{7}$$

where $\xi_i^\mu(t)$ is considered as the input and $\sigma_i(t)$ as the output with $S(\sigma_i(t))$ its entropy 
and $S(\sigma_i(t)|\xi_i^\mu(t))$ its conditional entropy, viz. 

$$S(\sigma_i(t)) = -\sum_{\sigma_i} p(\sigma_i(t)) \ln[p(\sigma_i(t))] \tag{8}$$

$$S(\sigma_i(t)|\xi_i^\mu(t)) = -\sum_{\sigma_i} p(\sigma_i(t)|\xi_i^\mu(t)) \ln[p(\sigma_i(t)|\xi_i^\mu(t))] . \tag{9}$$

Here $p(\sigma_i(t))$ denotes the probability distribution for the neurons at time $t$ and 
$p(\sigma_i(t)|\xi_i^\mu(t))$ indicates the conditional probability that the $i$-th neuron is in a state 
$\sigma_i(t)$ at time $t$ given that the $i$-th site of the stored pattern to be retrieved is $\xi_i^\mu(t)$. 



3. Dynamics at arbitrary temperature 

We suppose that the initial configuration $\{\sigma_i(1)\}$ is a collection of i.i.d.r.v. with first 
and second moments given by $E[\sigma_i(1)] = E[(\sigma_i(1))^2] = q_0$. We furthermore assume that this 
configuration is correlated with only one stored pattern, say pattern $\mu = 1$, such that 

$$\mathrm{Cov}(\xi_i^\mu(1), \sigma_i(1)) = \delta_{\mu,1} M_0^1 \tilde{a} . \tag{10}$$

We then obtain the order parameters (5) at the initial time step $t = 1$ in the 
thermodynamic limit by the law of large numbers (LLN). For the main overlap we 
have 

$$M^\mu(1) = \lim_{N\to\infty} M_N^\mu(1) \stackrel{\mathrm{LLN}}{=} \frac{1}{\tilde{a}} E[\eta_i^\mu(1)(\sigma_i(1) - a)] = \frac{1}{\tilde{a}} \mathrm{Cov}(\xi_i^\mu(1), \sigma_i(1)) = \delta_{\mu,1} M_0^1 \tag{11}$$

and for the neural activity 

$$q(1) = \lim_{N\to\infty} q_N(1) \stackrel{\mathrm{LLN}}{=} E[\sigma_i(1)] = q_0 . \tag{12}$$

The evolution equations governing the dynamics are then obtained following the 
methods based upon a signal-to-noise analysis of the local field (see, e.g., [10]-[15] for 
the case without threshold and without bias, i.e., $a = 1/2$). The local field is split into 
a signal (from the condensed pattern $\mu = 1$) and a noise (from the non-condensed 
patterns $\mu > 1$). For a recent overview comparing various architectures we 
refer to [16]. Since the method is standard by now we only write down the final results. 



At zero temperature we obtain for a general time step 

$$M^1(t+1) = 1 - \frac{1}{2} \left\{ \mathrm{erfc}\left( \frac{(1-a)M^1(t) - \theta(t)}{\sqrt{2\alpha D(t)}} \right) + \mathrm{erfc}\left( \frac{aM^1(t) + \theta(t)}{\sqrt{2\alpha D(t)}} \right) \right\} \tag{13}$$

$$q(t+1) = aM^1(t+1) + \frac{1}{2}\, \mathrm{erfc}\left( \frac{aM^1(t) + \theta(t)}{\sqrt{2\alpha D(t)}} \right) \tag{14}$$

$$D(t+1) = Q(t+1) + \frac{1}{2\pi\alpha} \left\{ a \exp\left( -\frac{(\theta(t) - (1-a)M^1(t))^2}{2\alpha D(t)} \right) + (1-a) \exp\left( -\frac{(\theta(t) + aM^1(t))^2}{2\alpha D(t)} \right) \right\}^2 \tag{15}$$

where $Q(t) = (1-2a)q(t) + a^2$ and $D(t)$ is the variance of the residual overlap containing 
the influence of the non-condensed patterns $\mu > 1$. The residual overlap is defined as 

$$r_N^\mu(t) = \frac{1}{\sqrt{N\tilde{a}}} \sum_{i=1}^{N} \eta_i^\mu(t)(\sigma_i(t) - a), \qquad \mu > 1, \tag{16}$$

and causes the intrinsic noise in the dynamics of the main overlap $M^1(t)$. 
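
The zero-temperature recursion for the main overlap, the neural activity and the noise variance can be sketched as a single map. This is a minimal illustration using only the Python standard library; the function name `zero_temperature_step` and the numerical inputs are assumptions made for the example.

```python
from math import erfc, exp, pi, sqrt

def zero_temperature_step(M, D, theta, a, alpha):
    """Map (M(t), D(t)) to (M(t+1), q(t+1), D(t+1)) at T = 0 for threshold theta."""
    s = sqrt(2.0 * alpha * D)
    x1 = ((1.0 - a) * M - theta) / s      # argument for sites where the pattern is 1
    x0 = (a * M + theta) / s              # argument for sites where the pattern is 0
    M_next = 1.0 - 0.5 * (erfc(x1) + erfc(x0))
    q_next = a * M_next + 0.5 * erfc(x0)
    Q_next = (1.0 - 2.0 * a) * q_next + a * a
    gauss = a * exp(-x1 * x1) + (1.0 - a) * exp(-x0 * x0)
    D_next = Q_next + gauss * gauss / (2.0 * pi * alpha)
    return M_next, q_next, D_next

# Symmetric sanity check: no signal and no threshold give M = 0 and q = 1/2
M_a, q_a, D_a = zero_temperature_step(M=0.0, D=1.0, theta=0.0, a=0.1, alpha=0.5)

# Sparse example with a fixed illustrative threshold
M_b, q_b, D_b = zero_temperature_step(M=1.0, D=0.0099, theta=0.1, a=0.01, alpha=0.1)
print(M_a, q_a, D_a)
print(M_b, q_b, D_b)
```

In the sparse example the threshold suppresses the activity of the sites that should be silent, so the activity stays close to $a$ while the overlap remains close to 1.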

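For non-zero temperatures thermal averages denoted by $\langle \cdots \rangle$ have to be taken in 
agreement with the distribution (1) such that 

$$\langle \sigma_i(t+1) \rangle = \frac{1}{2} [1 + \tanh(\beta h_i(t))] \tag{17}$$

and 

$$M^\mu(t) = \lim_{N\to\infty} \langle M_N^\mu(t) \rangle, \qquad q(t) = \lim_{N\to\infty} \langle q_N(t) \rangle . \tag{18}$$

The stochastic dynamics can then be described through the following equations for the 
order parameters 

$$M^1(t+1) = \frac{1}{2} \left\{ \int \mathcal{D}x \tanh\beta\left[ (1-a)M^1(t) - \theta(t) + \sqrt{\alpha D(t)}\, x \right] - \int \mathcal{D}x \tanh\beta\left[ -aM^1(t) - \theta(t) + \sqrt{\alpha D(t)}\, x \right] \right\} \tag{19}$$

$$q(t+1) = aM^1(t+1) + \frac{1}{2} \left\{ 1 + \int \mathcal{D}x \tanh\beta\left[ -aM^1(t) - \theta(t) + \sqrt{\alpha D(t)}\, x \right] \right\} \tag{20}$$

$$D(t+1) = Q(t+1) + \left[ \frac{\beta}{2} \left\{ 1 - a \int \mathcal{D}x \tanh^2\beta\left[ (1-a)M^1(t) - \theta(t) + \sqrt{\alpha D(t)}\, x \right] - (1-a) \int \mathcal{D}x \tanh^2\beta\left[ -aM^1(t) - \theta(t) + \sqrt{\alpha D(t)}\, x \right] \right\} \right]^2 D(t) \tag{21}$$

where $\mathcal{D}x$ is the Gaussian measure $\mathcal{D}x = dx\,(2\pi)^{-1/2} \exp(-x^2/2)$. 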
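
The finite-temperature recursion can be evaluated numerically once the Gaussian integrals are discretised. The sketch below assumes NumPy and replaces the Gaussian measure by a plain Riemann sum on a fixed grid, which is an implementation choice and not part of the analysis. The noise-variance update is written as $D(t+1) = Q(t+1) + \chi^2 D(t)$ with $\chi = (\beta/2)\{1 - a\langle\tanh^2\rangle_1 - (1-a)\langle\tanh^2\rangle_0\}$, a form chosen here because it reduces to the zero-temperature expression for $\beta \to \infty$; it should be treated as an assumption. All names are hypothetical.

```python
from math import erfc, pi, sqrt
import numpy as np

# Fixed grid and weights approximating the Gaussian measure Dx
_X = np.linspace(-8.0, 8.0, 8001)
_W = np.exp(-0.5 * _X ** 2) / sqrt(2.0 * pi) * (_X[1] - _X[0])

def gauss_avg(values):
    """Riemann-sum approximation of the Gaussian average of values(x)."""
    return float(np.sum(values * _W))

def thermal_step(M, D, theta, a, alpha, beta):
    """Map (M(t), D(t)) to (M(t+1), q(t+1), D(t+1)) at inverse temperature beta."""
    noise = sqrt(alpha * D) * _X
    t1 = np.tanh(beta * ((1.0 - a) * M - theta + noise))  # sites where the pattern is 1
    t0 = np.tanh(beta * (-a * M - theta + noise))         # sites where the pattern is 0
    M_next = 0.5 * (gauss_avg(t1) - gauss_avg(t0))
    q_next = a * M_next + 0.5 * (1.0 + gauss_avg(t0))
    Q_next = (1.0 - 2.0 * a) * q_next + a * a
    chi = 0.5 * beta * (1.0 - a * gauss_avg(t1 ** 2) - (1.0 - a) * gauss_avg(t0 ** 2))
    D_next = Q_next + chi * chi * D
    return M_next, q_next, D_next

a, alpha, M, D, theta = 0.01, 0.1, 1.0, 0.0099, 0.1   # illustrative inputs

# Very large beta should approach the zero-temperature overlap
M_cold, q_cold, _ = thermal_step(M, D, theta, a, alpha, beta=1.0e4)
s = sqrt(2.0 * alpha * D)
M_zero_T = 1.0 - 0.5 * (erfc(((1.0 - a) * M - theta) / s) + erfc((a * M + theta) / s))

# Very small beta: the state becomes random, so M -> 0 and q -> 1/2
M_hot, q_hot, _ = thermal_step(M, D, theta, a, alpha, beta=1.0e-8)
print(M_cold, M_zero_T, M_hot, q_hot)
```

The two limits give a quick consistency check of the signs in the overlap and activity equations.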



4. Thresholds 

4.1. Low activity and self-control 

In the limit of low activity it has been emphasized already in the study of extremely 
diluted and fully connected architectures that one should try to keep the pattern activity 
of the network during the retrieval process the same as the one for the memorized 
patterns [|T], [| 0, |I7|, 0, ^(J. Also for the layered model considered here one easily 
finds for fixed $\alpha$ and zero threshold by using eqs. (13)-(15) that in the limit $a \to 0$ the 
neural activity behaves as $q(t) \sim \frac{1}{2} + aM^1(t)$ and always tends to 1/2. The way to avoid 
this is to choose, given $a$, the capacity $\alpha$ such that $aM^1(t) \gg \sqrt{2\alpha D(t)}$, but this means 
that when $a$ decreases the critical capacity $\alpha_c$ is going to decrease too. In fact, numerical 
experiments on the layered model show that for $\theta = 0$ and $a \sim 10^{-3}$, $\alpha_c \sim 10^{-4}$. Similar 
considerations stay valid for non-zero temperatures. 

Therefore, in the retrieval process we need to control the neural activity and keep 
it, at each layer, the same as the one for the stored patterns: $q(t) = a$. For a network 
with low activity this requires the introduction of a threshold $\theta(t)$ in the definition of 
the local field (2). For the extremely diluted model a time-dependent threshold has been 
chosen as a function of the noise in the system and the pattern activity, adapting 
itself in the course of the time evolution. The novel idea was to let the network itself 
autonomously counter the residual noise at each time step of the dynamics without 
having to impose any external constraints. 

In the following we start from an analogous general form for this self-control 
threshold 

$$\theta(t)_{sc} = c(a) \sqrt{\alpha D(t)} \tag{22}$$

where we recall that $D(t)$ is the variance of the noise contribution in the local field. 
For the determination of $c(a)$ we consider the form (14) for the layered architecture and 
require that the term 

$$\mathrm{erfc}\left( \frac{aM^1(t) + \theta(t)}{\sqrt{2\alpha D(t)}} \right) = \mathrm{erfc}\left( \frac{aM^1(t)}{\sqrt{2\alpha D(t)}} + \frac{c(a)}{\sqrt{2}} \right) \tag{23}$$

must vanish faster than $a$. This can be realised by choosing $c(a) = \sqrt{-2\ln a}$. We 
furthermore remark that in the low activity limit the recursion relation (15) for $D(t+1)$ 
leads to $D(t+1) \sim Q(t+1)$. This shows explicitly that in this limit the result for 
the layered model is similar to the one for the extremely diluted model. Indeed, we 
intuitively expect that in the limit of low activity all models roughly behave in the same 
way. 
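
The requirement that the error-function term vanishes faster than the activity can be checked numerically for the choice $c(a) = \sqrt{-2\ln a}$. The sketch below is illustrative only: it evaluates the bound obtained by dropping the non-negative signal contribution, i.e. $\mathrm{erfc}(c(a)/\sqrt{2})/a$, and the function name is hypothetical.

```python
from math import erfc, log, sqrt

def residual_term_over_a(a):
    """Ratio erfc(c(a) / sqrt(2)) / a for the choice c(a) = sqrt(-2 ln a)."""
    c = sqrt(-2.0 * log(a))
    return erfc(c / sqrt(2.0)) / a

activities = [1e-1, 1e-2, 1e-3, 1e-4, 1e-6]
ratios = [residual_term_over_a(a) for a in activities]
for a, r in zip(activities, ratios):
    print(a, r)
```

The ratio decreases as $a$ decreases, although only slowly: for large arguments $\mathrm{erfc}(x) \approx e^{-x^2}/(x\sqrt{\pi})$, so the ratio behaves as $1/\sqrt{\pi|\ln a|}$.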

The line of arguments above is also valid at arbitrary temperature. In the limit of 
low activity it is straightforward to show that the second term on the r.h.s. of eq. (20) 
vanishes faster than the activity $a$. 

We recall that this self-control threshold (22) is a macroscopic parameter, thus no 
average must be taken over the microscopic random variables at each time step $t$. We 
have in fact a mapping with a threshold changing at each time step, but no statistical 
history intervenes in this process. 
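
Because the self-control threshold depends only on the macroscopic quantity $D(t)$, the zero-temperature dynamics closes as a simple iterated map. The sketch below compares self-control with a zero threshold. It uses only the standard library; the initial condition $D(1) = Q(1)$, the parameter values and all function names are assumptions made for illustration.

```python
from math import erfc, exp, log, pi, sqrt

def step(M, D, theta, a, alpha):
    """Zero-temperature map (M, D) -> (M, q, D) for a given threshold."""
    s = sqrt(2.0 * alpha * D)
    x1 = ((1.0 - a) * M - theta) / s
    x0 = (a * M + theta) / s
    M_next = 1.0 - 0.5 * (erfc(x1) + erfc(x0))
    q_next = a * M_next + 0.5 * erfc(x0)
    Q_next = (1.0 - 2.0 * a) * q_next + a * a
    gauss = a * exp(-x1 * x1) + (1.0 - a) * exp(-x0 * x0)
    return M_next, q_next, Q_next + gauss * gauss / (2.0 * pi * alpha)

def run(a, alpha, M0, q0, self_control, steps=200):
    """Iterate the map with either the self-control threshold or zero threshold."""
    D = (1.0 - 2.0 * a) * q0 + a * a      # assumption: start from D(1) = Q(1)
    M, q = M0, q0
    c = sqrt(-2.0 * log(a))               # c(a) = sqrt(-2 ln a)
    for _ in range(steps):
        theta = c * sqrt(alpha * D) if self_control else 0.0
        M, q, D = step(M, D, theta, a, alpha)
    return M, q

a, alpha = 0.01, 0.1                      # illustrative activity and loading
M_sc, q_sc = run(a, alpha, M0=1.0, q0=a, self_control=True)
M_zero, q_zero = run(a, alpha, M0=1.0, q0=a, self_control=False)
print(M_sc, q_sc, M_zero, q_zero)
```

For these illustrative parameters the self-controlled map keeps the activity close to $a$ and the overlap close to 1, whereas with zero threshold the activity drifts towards 1/2 and retrieval is lost.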

In the next section we study explicitly the influence of this threshold on the retrieval 
quality of the network dynamics. For the extremely diluted model |], 3 and in the case 
of sparsely coded sequential patterns it has been shown that this retrieval quality is 
considerably improved for low activity. In the case of the extremely diluted model this 
improvement also works for not so low activity [[|. Furthermore, although the form of 
the threshold has been derived at zero temperature, we also want to find out whether 
it works at finite temperatures. 



4.2. Optimising the mutual information 

We have argued that the mutual information function (7) is a better concept than the 
Hamming distance in order to measure the retrieval quality, especially in the limit of 
low activity. So, a second type of threshold we introduce is obtained by optimising this 
mutual information. 

We start by calculating the mutual information for the case at hand using the 
eqs. (7), (8) and (9). In the sequel we drop the index $t$. Because of the mean-field 
character of our model the following formulas hold for every site index $i$ on each layer $t$. 
After some algebra we find for the conditional probability 

$$p(\sigma|\xi) = [\gamma_0 + (\gamma_1 - \gamma_0)\xi]\, \delta(\sigma - 1) + [1 - \gamma_0 - (\gamma_1 - \gamma_0)\xi]\, \delta(\sigma) \tag{24}$$

where $\gamma_0 = q - aM^1$ and $\gamma_1 = (1-a)M^1 + q$, and where $M^1$ and $q$ are precisely the 
order parameters (5) for $N \to \infty$. Using the probability distribution of the patterns we 
obtain 

$$p(\sigma) = q\,\delta(\sigma - 1) + (1 - q)\,\delta(\sigma) . \tag{25}$$

Hence the entropy (8) and the conditional entropy (9) become 

$$S(\sigma) = -q \ln q - (1-q) \ln(1-q) \tag{26}$$

$$S(\sigma|\xi) = -[\gamma_0 + (\gamma_1 - \gamma_0)\xi] \ln[\gamma_0 + (\gamma_1 - \gamma_0)\xi] - [1 - \gamma_0 - (\gamma_1 - \gamma_0)\xi] \ln[1 - \gamma_0 - (\gamma_1 - \gamma_0)\xi] . \tag{27}$$

By averaging the conditional entropy over the pattern $\xi$ we get 

$$\langle S(\sigma|\xi) \rangle_\xi = -a[\gamma_1 \ln\gamma_1 + (1-\gamma_1)\ln(1-\gamma_1)] - (1-a)[\gamma_0 \ln\gamma_0 + (1-\gamma_0)\ln(1-\gamma_0)] \tag{28}$$

such that the mutual information function (7) for the layered model is given by 

$$I(\sigma;\xi) = -q\ln q - (1-q)\ln(1-q) + a[\gamma_1 \ln\gamma_1 + (1-\gamma_1)\ln(1-\gamma_1)] + (1-a)[\gamma_0 \ln\gamma_0 + (1-\gamma_0)\ln(1-\gamma_0)] . \tag{29}$$

At time $t$ the mutual information function depends on the main overlap $M^1(t)$, the neural 
activity $q(t)$, the pattern activity $a$, the storage capacity $\alpha$ and the inverse temperature 
$\beta$. The evolution of the main overlap and of the neural activity (eqs. (13), (14) for zero 
temperature and (19), (20) for arbitrary temperature) depends on the specific choice of 
the threshold in the definition of the local field (2). We consider a time-independent 
threshold $\theta(t) = \theta$ and calculate the value of (29) at equilibrium for fixed $a$, $\alpha$, $M_0$, $q_0$ 
and $\beta$. The optimal choice for this threshold chosen at equilibrium, $\theta = \theta_{opt}$, is then the 
one for which the mutual information function is maximal. 
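
The mutual information as a function of the order parameters is straightforward to evaluate. The following sketch uses only the standard library; the helper names and the clipping of the conditional probabilities to $[0, 1]$ (a guard against rounding errors) are assumptions made for the example.

```python
from math import log

def xlogx(x):
    """x ln x with the convention 0 ln 0 = 0."""
    return x * log(x) if x > 0.0 else 0.0

def binary_entropy(p):
    """Entropy (in nats) of a binary variable that equals 1 with probability p."""
    return -xlogx(p) - xlogx(1.0 - p)

def mutual_information(M, q, a):
    """Mutual information between a neuron and its pattern bit for overlap M and activity q."""
    gamma0 = min(max(q - a * M, 0.0), 1.0)          # P(sigma = 1 | xi = 0)
    gamma1 = min(max(q + (1.0 - a) * M, 0.0), 1.0)  # P(sigma = 1 | xi = 1)
    conditional = a * binary_entropy(gamma1) + (1.0 - a) * binary_entropy(gamma0)
    return binary_entropy(q) - conditional

I_none = mutual_information(M=0.0, q=0.3, a=0.1)     # no overlap: no information
I_perfect = mutual_information(M=1.0, q=0.1, a=0.1)  # perfect retrieval with q = a
print(I_none, I_perfect, binary_entropy(0.1))
```

The two limiting cases behave as expected: without overlap the information vanishes, and for perfect retrieval with $q = a$ it equals the entropy of a single pattern bit.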



5. Results 



We have studied the retrieval properties for the layered model with $\theta_{sc}$ and $\theta_{opt}$ by 
numerically solving the recursion relations derived in section 3 with an activity ranging 
from $a = 0.001$ to $a = 0.3$ at various inverse temperatures $\beta = 3, 4, 5, 10, 100, \infty$. We 
are interested only in the retrieval solutions with $M^1 > 0$ (in the sequel we drop the 
superindex 1) and carrying a non-zero information $I$. The results for zero and non-zero 
temperature have been analysed separately. Our main aim is to study whether 
self-control, introduced for extremely diluted networks, also works for other models, in casu 
a layered architecture at zero temperature, as has been claimed, and to check whether such 
a threshold can still be useful at non-zero temperatures. Moreover, we compare this 
self-control method, which is mainly designed for low activity but also works for higher 
activities, with the optimization method, which works for all values of the activity 
but has to be determined externally for every loading and every temperature. 

5.1. Zero temperature 

In fig. 1 we have plotted the information content $i = \alpha I$ as a function of $\theta$ without 
self-control or a priori optimization for pattern activity $a = 0.01$ and different values of 
the storage capacity $\alpha$. For every value of $\alpha$ below its critical value, there is a range 
for the threshold where the information content is different from zero. For any choice of 
the threshold in this range retrieval is possible. This retrieval range becomes very small 
when the capacity approaches its critical value $\alpha_c = 4.72$. 
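
The dependence of the information content on a fixed threshold can be explored with a simple scan: for each threshold value, iterate the zero-temperature recursion to its fixed point and evaluate $i = \alpha I$. The sketch below is an illustration only. The loading $\alpha = 0.1$, the threshold grid, the initial conditions ($M_0 = 1$, $q_0 = a$, $D(1) = Q(1)$) and all names are assumptions, and the scan is not meant to reproduce the figure.

```python
from math import erfc, exp, log, pi, sqrt

def step(M, D, theta, a, alpha):
    """Zero-temperature map (M, D) -> (M, q, D) for a fixed threshold."""
    s = sqrt(2.0 * alpha * D)
    x1 = ((1.0 - a) * M - theta) / s
    x0 = (a * M + theta) / s
    M_next = 1.0 - 0.5 * (erfc(x1) + erfc(x0))
    q_next = a * M_next + 0.5 * erfc(x0)
    Q_next = (1.0 - 2.0 * a) * q_next + a * a
    gauss = a * exp(-x1 * x1) + (1.0 - a) * exp(-x0 * x0)
    return M_next, q_next, Q_next + gauss * gauss / (2.0 * pi * alpha)

def entropy(p):
    """Binary entropy in nats, with p clipped to [0, 1] against rounding errors."""
    p = min(max(p, 0.0), 1.0)
    return -sum(x * log(x) for x in (p, 1.0 - p) if x > 0.0)

def information_content(theta, a, alpha, M0=1.0, steps=300):
    """Information content alpha * I at the fixed point reached from (M0, q0 = a)."""
    M, q = M0, a
    D = (1.0 - 2.0 * a) * q + a * a       # assumption: start from D(1) = Q(1)
    for _ in range(steps):
        M, q, D = step(M, D, theta, a, alpha)
    I = entropy(q) - a * entropy(q + (1.0 - a) * M) - (1.0 - a) * entropy(q - a * M)
    return alpha * I

a, alpha = 0.01, 0.1                      # illustrative activity and loading
thetas = [0.01 * k for k in range(100)]   # threshold grid from 0.00 to 0.99
infos = [information_content(th, a, alpha) for th in thetas]
best_index = max(range(len(thetas)), key=infos.__getitem__)
theta_best, info_best = thetas[best_index], infos[best_index]
print(theta_best, info_best)
```

For these parameters the information content vanishes at zero threshold and at thresholds close to 1, and is non-zero in a window in between, which is the qualitative behaviour described above.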

Defining the basin of attraction as the range of initial values $M_0 \in [0, 1]$ which 
lead to the retrieval attractor $M(t) \sim 1$, we remark at this point that the size of this 
basin strongly depends on the specific choice of the threshold in the retrieval range. 
Technically it turns out that the value to be chosen for the latter in order to have the 
largest basin is the minimal $\theta$ in the retrieval range. This, of course, has to be repeated 
for every $\alpha$. This threshold optimises the information content and is called, as specified 
before, $\theta_{opt}$. 

Figure 2 represents the dynamical evolution of the network. The retrieval overlap 
$M(t)$ is shown as a function of time for different initial values $M_0$, $q_0 = 0.001 = a$ 
and $\alpha = 25$. A self-control threshold $\theta_{sc} = [-2(\ln a)\alpha Q(t)]^{1/2}$ (fig. 2(a)) is compared 
with an optimal threshold $\theta_{opt}$ (fig. 2(b)) concerning the values of the minimal $M_0$ for 
retrieval, the fixed-point $M^*$ and the critical capacity $\alpha_c$. It is seen that self-control 
works better than optimization and both much better than a zero threshold (where 
there is no retrieval at all since $\alpha_c = 5.3 \times 10^{-5}$ only). This can be interpreted as a 
result of the property of adaptivity in the course of the time evolution inherent in the 
self-control method. 

In fig. 3 the retrieval phase diagram is illustrated for $a = 0.001$ and $q_0 = a$. In 
the low activity limit the basin of attraction is substantially improved by self-control 
even near the border of the critical storage. Hence, also the storage capacity is larger 
with self-control. Furthermore, we have compared these curves with the one for a model 
without threshold in the low activity limit. Since we find a very small storage capacity 
(of order $10^{-4}$) such a network without threshold is of very little interest. 

Plotting the retrieval fixed-point $M^*$ as a function of $\alpha$ we have found a first order 
transition from the retrieval phase ($M^* > 0$) to the non-retrieval one ($M^* = 0$), see fig. 4 
for different values of $a$. We remark that the curves for $a = 0.001$ are out of the scale of 
this figure. In that case we find $\alpha_c = 34.32$ and $M^* \sim 1$ for $0 < \alpha < 20$. We compare 
the fixed-point behaviour found with self-control (solid lines) with the results obtained 
by choosing the threshold through the optimization of the mutual information function 
(dashed lines). Roughly speaking, self-control is the best choice for activities below 
0.05. For $a$ above this value, but still small compared with a homogeneous distribution 
$a = 1/2$, e.g. $a = 0.3$, self-control continues to perform quite well; however, it ceases to 
be better than optimization. 

Finally, we have studied the leading behaviour of the critical capacity in the limit 
$a \to 0$. We have found that $\alpha_c(a) \sim (a|\ln a|)^{-1}$. This is consistent with former studies 
on other low activity models (see [1], [3] and references therein). Moreover, we remark 
that for $a$ in the range $(10^{-4}, 10^{-3})$ the proportionality coefficient seems to be constant 
and given by 0.25. 
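
As a quick arithmetic check of this scaling, the sketch below evaluates $0.25/(a|\ln a|)$ at $a = 0.001$, the upper end of the quoted range; the function name is hypothetical.

```python
from math import log

def critical_capacity_estimate(a, coefficient=0.25):
    """Leading-order estimate coefficient / (a |ln a|) of the critical capacity."""
    return coefficient / (a * abs(log(a)))

estimate = critical_capacity_estimate(0.001)
print(estimate)
```

The estimate is about 36.2, which lies within roughly 5% of the value $\alpha_c = 34.32$ quoted above for $a = 0.001$.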

5.2. Non-zero temperature 

Since self-control is completely autonomous and since it improves the retrieval quality 
also for not so sparse networks it is worth checking how it performs for non-zero 
temperatures. Also in this case we compare it with the optimal threshold for which 
we recall that it has to be calculated by hand when the network has reached equilibrium 
for every loading $\alpha$ and every inverse temperature $\beta$. 

In fig. 5 we have studied the retrieval fixed-points of the main overlap as a function 
of the storage capacity for different values of the temperature and of the pattern activity. 
The results are plotted for $a = 0.1$ (fig. 5(a)) and $a = 0.01$ (fig. 5(b)) and increasing $\beta$. 
The lines end at the critical capacity where a first-order transition to the non-retrieval 
phase occurs. At this point we recall that also for these non-zero temperatures in both 
cases the presence of a non-zero threshold is strictly necessary in order for the network 
to evolve toward the retrieval phase for these storage capacities. 

For $\beta = 100$ the results of the deterministic network are recovered. For $a = 0.1$ we 
already know from the previous analysis at zero temperature that optimization works 
better than self-control. For $a = 0.01$ the reverse situation is valid. For smaller $\beta$ and 
for smaller storage capacities self-control does not work as well: optimization leads to 
a bigger value for the retrieval overlap than self-control does. 

For the lowest pattern activity $a = 0.01$ (fig. 5(b)) self-control works worse for 
increasing temperature. The critical capacity of the network with self-control is smaller 
than the critical capacity obtained by optimization. In fact, for $\beta = 3, 4$ it is about 
half. For pattern activity below $a = 0.01$ the critical capacity with self-control becomes 
still smaller and it is smaller than the critical capacity obtained by optimization. 

We can then summarize the peculiar behaviour with self-control for small storage 
capacities as follows. We usually expect the retrieval fixed-points to have the greatest 
overlap values at zero storage capacity and then to slowly decrease until the critical 
capacity is reached, where there is a phase transition. This is, indeed, the behaviour at 
zero temperature with whatever choice of the threshold. At non-zero temperature this 
behaviour is found with the optimisation approach. With the self-control method, however, 
the retrieval fixed-points obtain their maximal retrieval overlap not at zero capacity, but at 
a higher value. 

The analysis of the temperature-capacity phase diagram with self-control and 
optimization for different values of the pattern activity is summarized in fig. 6. We 
discuss the results for decreasing $a$. For $a = 0.1$, fig. 6(a), the two methods give similar 
results except near zero temperature where the critical capacity with optimization 
is slightly bigger than with self-control, as we expect from the analysis at zero 
temperature. Decreasing the value of the pattern activity to $a = 0.01$, fig. 6(b), 
self-control starts to work less well for a bigger region of high temperatures but it is better 
at lower temperatures. The curves in fig. 6(c) show that at high temperature the region 
of retrieval with self-control becomes rather small when the activity is further lowered 
to $a = 0.001$. We also remark that for any choice of the pattern activity below 0.05 
there is a value of the temperature where the two curves intersect. This is consistent 
with the fact that at low activity in the limit of zero temperature self-control works 
better than optimization for $a < 0.05$. We conclude that, compared with optimization, 
self-control gives quite good results for activities in the range $a \in [0.01, 0.05]$. When we 
want to consider lower activities ($a = 0.001$ and less) at arbitrary non-zero temperature 
self-control ceases to be a good method to control the noise during the dynamics of the 
network. In this case the temperature dependent externally chosen threshold optimizing 
the mutual information function leads to better retrieval qualities than the temperature 
independent self-control threshold. 



6. Concluding remarks 

In this paper we have studied the effects of a threshold in the gain function on the 
parallel dynamics in layered neural networks with variable activity. Such a threshold 
considerably enlarges the critical capacity of the network. Two different types of 
thresholds are considered. The first one forces the neural activity to be the same as 
the activity of the stored patterns at every step of the retrieval process and adapts itself 
for this purpose in the course of the time evolution. It provides a complete self-control 
mechanism. The second optimizes the mutual information function in equilibrium. It 
has to be given externally. For zero temperature and low activity $a < 0.05$ it is found 
that self-control performs the best in considerably improving the storage capacity, the 
basin of attraction and the mutual information content, exactly as for extremely diluted 
models. And, in comparison with the optimization method, it even gives a comparable 
improvement for higher activities. Moreover, for non-zero temperatures self-control, 
although being designed at temperature zero, still gives quite good results for lower 
activities ($a < 0.05$) that are bigger than 0.01. Outside this region optimization done 
externally for every loading and every temperature leads to better overall retrieval 
qualities (except, obviously, near the critical capacity at zero temperature). It is worth 
studying whether self-control can still be improved by making it temperature dependent 
and/or whether optimization of the mutual information content can be done in a 
self-controlled way. 

Figure 2. The evolution of the main overlap $M(t)$ for several initial values $M_0$ with 
$q_0 = a = 0.001$, $\alpha = 25$ for the self-control model (a) and the optimal threshold model 
(b). 

Figure 4. The retrieval fixed-points $M^*$ as a function of $\alpha$ for the self-control model 
(full line) and the optimal threshold model (dashed line) with decreasing pattern 
activity: $a = 0.3, 0.1, 0.05, 0.03, 0.01$ (from left to right). 

Figure 5. The retrieval fixed-points $M^*$ as a function of $\alpha$ for several values of the 
inverse temperature for the self-control model (full line) and the optimal threshold 
model (dashed line) for $a = 0.1$ (a) and $a = 0.01$ (b). 







Figure 6. The temperature-capacity phase diagram for the self-control model (full 
line) and the optimal threshold model (dashed-dotted line) for a = 0.1 (a), a = 0.01 
(b) and a = 0.001 (c). 



